Draft: Arc runner computeenv #16750
base: dev
Conversation
- Adding pyarcrest conditional requirement for the ArcRESTJobRunner
- Require version 0.2 which is compatible with the ARC job runner.
- …cessary job_actions method. Further improvements to the action on jobs will come.
- …e. And rewriting input-paths.
return
""" prepare_job() calls prepare() but does not allow passing a compute_environment object.
Since I need to define my own compute_environment for the remote compute, I must call it here, passing the compute_environment.
TODO - not a good solution"""
I'm facing a similar problem with the DIRAC jobrunner and thought of modifying prepare_job to take an extra argument compute_environment (default None), so it should keep working with existing implementations but allow for customisation.
Thanks for working on this, we need it so much in our astroparticle physics domains!
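The suggestion above can be sketched as follows. This is a minimal, self-contained illustration with dummy classes, not Galaxy's actual BaseJobRunner or JobWrapper implementations; the point is only that a default of None keeps existing call sites working while letting a runner inject its own environment:

```python
# Hypothetical sketch: prepare_job() grows an optional compute_environment
# argument (default None). Class and method names mimic the discussion,
# not Galaxy's real code.

class DummyJobWrapper:
    def __init__(self):
        self.prepared_with = "unset"

    def prepare(self, compute_environment=None):
        # Record which environment was used for preparation.
        self.prepared_with = compute_environment


class BaseJobRunner:
    def prepare_job(self, job_wrapper, compute_environment=None):
        # Forward the (possibly None) compute_environment to prepare();
        # existing callers simply omit the new argument.
        job_wrapper.prepare(compute_environment=compute_environment)


runner = BaseJobRunner()
wrapper = DummyJobWrapper()
runner.prepare_job(wrapper)  # old call style still works
assert wrapper.prepared_with is None
runner.prepare_job(wrapper, compute_environment="arc-remote")
assert wrapper.prepared_with == "arc-remote"
```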
This PR is related to #16653
It is based on the same branch, but I am now using the tool_script.sh as the executable that ARC runs.
I also demonstrate a very brute-force and simplistic way of using the compute_environment to rewrite the input paths in order for ARC to get the correct paths for the command-line when running on the remote site (that does not have a shared directory with the galaxy server).
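The brute-force input-path rewriting described above could look roughly like this. The class name, the remote session-directory path, and the rewrite rule (re-root each dataset under the remote workdir by basename) are assumptions for illustration, not the PR's actual code:

```python
import os

# Hypothetical sketch of a remote compute_environment that rewrites
# local Galaxy dataset paths so the command line works on an ARC site
# with no shared filesystem.
class RemoteComputeEnvironment:
    def __init__(self, remote_working_dir):
        self.remote_working_dir = remote_working_dir

    def input_path_rewrite(self, local_path):
        # ARC stages each input file into the job's session directory,
        # so keep only the basename and re-root it under the remote dir.
        return os.path.join(self.remote_working_dir, os.path.basename(local_path))


env = RemoteComputeEnvironment("/arc/session")
print(env.input_path_rewrite(
    "/storage/galaxy/data/datasets/4/5/c/dataset_45cf.dat"
))
# -> /arc/session/dataset_45cf.dat
```

This simplistic rule is exactly why symlinked inputs (as in the minimap2 example below) are hard: rewriting the dataset path does not fix the `ln -s` target embedded elsewhere in the command line.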
I am still using a custom ARC tool - to avoid complications with complex path resolutions like e.g. this tool: https://github.com/galaxyproject/tools-iuc/blob/main/tools/minimap2/minimap2.xml where there are softlinks:
ln -f -s '/storage/galaxy/data/datasets/4/5/c/dataset_45cf1bcb-28b1-4167-878d-1fb17636064e.dat' reference.fa && minimap2 --q-occ-frac 0.01 -t ${GALAXY_SLOTS:-4} reference.fa '/storage/galaxy/data/datasets/4/5/c/dataset_45cf1bcb-28b1-4167-878d-1fb17636064e.dat' -a | samtools view --no-PG -hT reference.fa | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/storage/galaxy/data/jobs/000/228/outputs/dataset_03e29ceb-8733-4836-8981-90391c58105f.dat'
in the command-line created, and I do not currently know how to solve this. The custom test prototyping tool is:
This tool demonstrates uploading local Galaxy files to the remote ARC server, as well as downloading all output from the job on the ARC side back to Galaxy (note, however, that the pattern for finding the ARC logs is currently wrong and does not match; the files are nevertheless present in the Galaxy job dir).
In addition, just for testing purposes, the tool expects a file listing remote input files that ARC will download on the remote server side, without involving Galaxy at all. This is not the solution we will end up with, but it demonstrates the data-staging capability in ARC.
The tool is tested using the following file-list file:
Also, for testing, upload a bash script that is called by tool_script.sh, for example:
The tool_script.sh produced by Galaxy is:
Once the job is done the workdir of the galaxy job will look like this:
It contains all the ARC logs, in addition to the job's output files.
This version of the ARC runner depends on pyarcrest 0.3: https://pypi.org/search/?q=pyarcrest (temporarily uploaded for testing purposes, later it will be included in the nordugrid ARC distro http://www.nordugrid.org/arc/arc6/common/repos/repository.html)
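Since pyarcrest is a conditional requirement (it will not be installed on every Galaxy instance), the runner needs a guarded import. Below is a generic sketch of that pattern; the flag name is made up here, and no specific pyarcrest submodule or API is assumed:

```python
# Guarded import for an optional runner dependency. If pyarcrest is not
# installed, the runner can raise a clear error at load time instead of
# failing with an opaque ImportError mid-job. PYARCREST_AVAILABLE is an
# illustrative name, not Galaxy's actual convention.
try:
    import pyarcrest  # optional dependency; may be absent
    PYARCREST_AVAILABLE = True
except ImportError:
    pyarcrest = None
    PYARCREST_AVAILABLE = False

print("pyarcrest available:", PYARCREST_AVAILABLE)
```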
How to test the changes?
(Select all options that apply)
For the galaxy admin:
For the user:
To move this closer to production-ready:
License